Abstract. A function $J$ defined on a family $\mathcal{C}$ of stationary processes is finitely observable if there is a sequence of functions $s_n$ such that $s_n(x_1 \ldots x_n) \to J(X)$ in probability for every process $X = (x_n) \in \mathcal{C}$. Recently, Ornstein and Weiss proved the striking result that if $\mathcal{C}$ is the class of aperiodic ergodic finite valued processes, then the only finitely observable isomorphism invariant defined on $\mathcal{C}$ is entropy [7]. We sharpen this in several ways. Our main result is that if $\mathcal{X} \to \mathcal{Y}$ is a zero-entropy extension of finite entropy ergodic systems and $\mathcal{C}$ is the family of processes arising from $\mathcal{X}$ and $\mathcal{Y}$, then every finitely observable function on $\mathcal{C}$ is constant. This implies Ornstein and Weiss' result, and extends it to many other families of processes; e.g. it shows that there are no non-trivial finitely observable isomorphism invariants for processes arising from Kronecker systems, or from mild and strong mixing zero entropy systems. It also implies that any finitely observable isomorphism invariant defined on the family of processes arising from irrational rotations must be constant for rotations belonging to a set of full Lebesgue measure.



1. Introduction 

Let $(x_n)_{n=-\infty}^{\infty}$ be an aperiodic ergodic process taking on finitely many values; without loss of generality the values are in $\mathbb{N}$. We may assume that $(x_n)$ arises from a generating partition $\mathcal{P} = (P_i)$ of an aperiodic, invertible and ergodic measure preserving system $\mathcal{X} = (X, \mathcal{B}, \mu, T)$; the system $\mathcal{X}$ is unique up to isomorphism. The question we are interested in is: what can we learn about the underlying system $\mathcal{X}$ by observing a sample path $(x_n)$? In principle, the answer is "everything", since by the ergodic theorem a typical sample path $(x_n)_{n=1}^{\infty}$ determines all finite distributions of the process, and these determine $\mathcal{X}$ up to isomorphism. However, a more realistic scenario is one in which at each time step another output of the process is revealed, i.e. at time $n$ we have observed the finite sequence $x_1 \ldots x_n$, and are asked to make a guess about the nature of $\mathcal{X}$ based on this data.

We call a scheme for producing such a sequence of guesses an observation 
scheme. To be precise, 

Definition 1.1. An observation scheme (or scheme for short) is a metric space $A$ and a sequence of functions $s_n : \mathbb{N}^n \to A$. An observation scheme is said to converge for a family of processes $\mathcal{C}$ if $\lim_{n\to\infty} s_n(x_1 \ldots x_n)$ exists in probability for every process $(x_n) \in \mathcal{C}$. A function $J : \mathcal{C} \to A$ is finitely observable if there is an observation scheme $(s_n)$ which converges to $J((x_n))$ for every $(x_n) \in \mathcal{C}$.

Note that the larger a family of processes is, the harder it is for a scheme to converge for every member of the family; hence large families have fewer finitely observable functions.

Nonetheless, many observation schemes $(s_n)$ are known for which the sequence $s_1(x_1), s_2(x_1, x_2), s_3(x_1, x_2, x_3), \ldots$ converges in probability or even almost surely for every ergodic process $(x_n)$. For example, if $s_n(x_1 \ldots x_n)$ counts the frequency of 1's appearing in $x_1 \ldots x_n$, then by the ergodic theorem $\lim_{n\to\infty} s_n(x_1 \ldots x_n)$ exists a.s. and equals the probability of the symbol 1 in the process $(x_n)$. This example and others like it show that some things about a process can be calculated from finite observations; but these are generally not isomorphism invariants, and so tell us nothing about the underlying dynamical system.
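As a concrete illustration, here is a minimal Python sketch of the frequency-of-1's scheme just described. The function names and the simulated i.i.d. process (with $P(x_n = 1) = 0.3$) are hypothetical choices made for the example, not part of the text; the scheme is applied to the growing prefixes $x_1 \ldots x_n$, exactly as an observer receiving one symbol per time step would apply it.

```python
import random
from typing import List, Sequence


def frequency_of_ones(xs: Sequence[int]) -> float:
    """The guess s_n(x_1 ... x_n): fraction of observed symbols equal to 1."""
    if not xs:
        raise ValueError("need at least one observation")
    return sum(1 for x in xs if x == 1) / len(xs)


def run_scheme(path: Sequence[int]) -> List[float]:
    """Return s_1(x_1), s_2(x_1 x_2), ..., keeping a running count of 1's."""
    guesses, ones = [], 0
    for n, x in enumerate(path, start=1):
        ones += (x == 1)
        guesses.append(ones / n)
    return guesses


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical i.i.d. process with P(x_n = 1) = 0.3.
    path = [1 if rng.random() < 0.3 else 0 for _ in range(10_000)]
    guesses = run_scheme(path)
    print(guesses[9], guesses[-1])  # early guess vs. guess after 10,000 symbols
```

The running count keeps each update $O(1)$, which matches the observational setting: the guess at time $n$ uses only $x_1 \ldots x_n$.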

For processes $(x_n), (y_n)$ etc. we denote by $\mathcal{X}, \mathcal{Y}$ respectively the dynamical systems determined by them. Write $(x_n) \cong (y_n)$ and $\mathcal{X} \cong \mathcal{Y}$ to indicate that $\mathcal{X}, \mathcal{Y}$ are isomorphic as dynamical systems. We will be interested in families of processes $\mathcal{C}$ which are closed under isomorphism, that is, they will have the property that if $(x_n) \in \mathcal{C}$ and $(y_n) \cong (x_n)$ then $(y_n) \in \mathcal{C}$. Such a family is called saturated. Usually we will specify $\mathcal{C}$ by some property of the underlying systems, e.g. $\mathcal{C}$ might be the family of all processes arising from an irrational rotation. In this case we would say for brevity that $\mathcal{C}$ is the class of irrational rotations.

Definition 1.2. Let $\mathcal{C}$ be a saturated family of processes, $A$ a metric space and $J : \mathcal{C} \to A$. Then $J$ is an isomorphism invariant for $\mathcal{C}$ (or invariant for short) if for every $(x_n), (y_n) \in \mathcal{C}$,

$$(x_n) \cong (y_n) \implies J((x_n)) = J((y_n)),$$

and $J$ is a complete invariant for $\mathcal{C}$ if the reverse implication holds. When $J$ is an invariant we write $J(\mathcal{X})$ instead of $J((x_n))$.

For quite some time it has been known that the entropy $h((x_n)) = h(\mathcal{X})$ of a process is finitely observable in the class of all ergodic processes. The earliest observation scheme for entropy is due to D. Bailey [1]. A number of simpler schemes have been developed, such as the Lempel–Ziv compression algorithm [?] and the Ornstein–Weiss estimators [?], [?].
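A hedged sketch of one of the simpler schemes mentioned above: the Lempel–Ziv estimate based on incremental (LZ78) parsing, with the usual normalisation $c(n)\log c(n)/n$, where $c(n)$ is the number of phrases. This is the textbook form of the estimator and is not claimed to be the exact scheme of the cited works; the function names are ours, and convergence is slow, so finite-$n$ values are only rough.

```python
import math
import random
from typing import Hashable, Sequence


def lz78_phrase_count(xs: Sequence[Hashable]) -> int:
    """Number of phrases c(n) in the incremental (LZ78) parsing of xs.

    Each new phrase is the shortest block, starting where the previous
    phrase ended, that has not appeared as a phrase before.
    """
    seen = set()
    phrase: tuple = ()
    count = 0
    for x in xs:
        phrase = phrase + (x,)
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            phrase = ()
    if phrase:  # leftover tail that repeats an earlier phrase
        count += 1
    return count


def lz78_entropy_estimate(xs: Sequence[Hashable]) -> float:
    """Entropy estimate in bits per symbol: c(n) * log2 c(n) / n."""
    n = len(xs)
    c = lz78_phrase_count(xs)
    return c * math.log2(c) / n if c > 1 else 0.0


if __name__ == "__main__":
    rng = random.Random(1)
    coin = [rng.randrange(2) for _ in range(20_000)]  # fair coin: h = 1
    constant = [0] * 20_000                            # constant: h = 0
    print(lz78_entropy_estimate(coin), lz78_entropy_estimate(constant))
```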

D. Ornstein and B. Weiss recently proved a striking converse to this: every finitely observable invariant for the class of all ergodic processes is a continuous function of entropy [7]. They also showed that there are no finitely observable invariants except entropy for any class which contains the Bernoulli processes, for the class of zero entropy processes, or for the class of zero entropy weak mixing processes.

However, their techniques do not settle what is finitely observable in several other interesting classes of systems. Ornstein and Weiss have asked if there exists a complete finitely observable invariant for the class of irrational rotations (translations by an irrational on the group $\mathbb{R}/\mathbb{Z}$); this is not implausible, since for this class there is a complete invariant for isomorphism, namely the spectrum, or equivalently the angle of rotation (up to sign and mod 1). We remark that there are no known complete invariants in the classes for which Ornstein and Weiss showed that entropy is the only invariant, with the exception of the class of Bernoulli systems, in which entropy is itself a complete invariant.

In an attempt to get a handle on this problem, we came up with the 
following, which is interesting in its own right: 

Theorem. Suppose $\mathcal{X} \to \mathcal{Y}$ is a zero entropy extension of finite entropy dynamical systems, that is, $h(\mathcal{X}) = h(\mathcal{Y})$. Let $\mathcal{C}$ be the class of processes arising from $\mathcal{X}, \mathcal{Y}$ (that is, from generating partitions of $\mathcal{X}$ and $\mathcal{Y}$). Then every finitely observable invariant for $\mathcal{C}$ is constant.

This allows us to recover the results of Ornstein and Weiss, and to settle the following problems:

Theorem. If $J$ is a finitely observable invariant on one of the following classes:

(1) the Kronecker systems (the class of systems with pure point spectrum),

(2) the zero entropy mild mixing processes,

(3) the zero entropy strong mixing processes,

then $J$ is constant.

For the class of irrational rotations we obtain a slightly weaker result: 

Theorem. For every finitely observable invariant $J$ on the class of irrational rotations, there is a Borel set $\Theta \subseteq [0,1)$ of full Lebesgue measure such that $J$ assigns the same value to processes arising from rotations by angles in $\Theta$. In particular, there is no complete finitely observable invariant for irrational rotations.

The rest of the paper is organized as follows. Section 2 presents some 
definitions and background. In section 3 we prove the theorem about zero- 
entropy extensions. Section 4 contains proofs of the other results, and in 
section 5 we mention some open problems. 

Acknowledgement. This paper was written as part of the authors' Ph.D. 
studies. We would like to thank our advisor Professor Benjamin Weiss for 
his encouragement, support and good advice. 

2. Preliminaries 
For general background on ergodic theory we refer to [3], [?].

2.1. Dynamical systems, partitions and processes. By an aperiodic ergodic system $\mathcal{X} = (X, \mathcal{B}, \mu, T)$ we mean that $(X, \mathcal{B}, \mu)$ is a standard probability space, $T$ is invertible and acts ergodically, and the set of periodic points is of measure zero. A measure preserving system $\mathcal{Y} = (Y, \mathcal{C}, \nu, S)$ is a factor of the system $\mathcal{X} = (X, \mathcal{B}, \mu, T)$ if there is a measure-preserving map $f : X \to Y$ defined almost everywhere satisfying $Sf = fT$. If there is such a map which is also invertible and bi-measurable then $\mathcal{X}, \mathcal{Y}$ are isomorphic.

A partition $\mathcal{P}$ of $X$ is a finite ordered collection of pairwise disjoint measurable sets $(P_i)_{i=1}^{n}$ whose union is $X$ (up to measure zero). If $\mathcal{P}, \mathcal{Q}$ are partitions of $X$ then the partition $\mathcal{P} \vee \mathcal{Q} = (P_i \cap Q_j)_{(i,j)}$ is the join of $\mathcal{P}, \mathcal{Q}$ (order the pairs lexicographically); the join of finitely many partitions is defined similarly. Write $T^n\mathcal{P} = (T^n P_i)$.

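A partition $\mathcal{P}$ of $X$ generates $\mathcal{X}$ if $\bigvee_{n=-\infty}^{\infty} T^n\mathcal{P} = \mathcal{B}$ up to measure zero, where $\bigvee_{n=-\infty}^{\infty} T^n\mathcal{P}$ is the $\sigma$-algebra generated by the collection $\bigcup_N \bigvee_{n=-N}^{N} T^n\mathcal{P}$.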



Classes which cannot be distinguished by finitary invariants






For a partition $\mathcal{P} = (P_j)_{j \in \mathbb{N}}$ and $\omega \in X$ we write $\mathcal{P}(\omega)$ for the index of the set in $\mathcal{P}$ that contains $\omega$. A partition $\mathcal{P}$ determines a stationary ergodic process $(x_n)$ with values in $\mathbb{N}$ by

$$x_n(\omega) = \mathcal{P}(T^n\omega).$$

We say that $x_i(\omega), x_{i+1}(\omega), \ldots, x_j(\omega)$ is the itinerary of $\omega$ (with respect to $\mathcal{P}$) from time $i$ to time $j$. The itinerary of $\omega$ from time $0$ to time $N-1$ is called the $(\mathcal{P}, N)$-name of $\omega$. If $\mathcal{P}$ is a generating partition for $\mathcal{X}$ then the system $\mathcal{X}$ and the partition $\mathcal{P}$ are determined, up to isomorphism, by the process $(x_n)$. We will say this process arises from $\mathcal{P}$ if $\mathcal{P}$ generates $\mathcal{X}$.
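To make the passage from a partition to a process concrete, the sketch below computes $(\mathcal{P}, n)$-names for an irrational rotation $Tx = x + \alpha \bmod 1$, a class discussed in the introduction. The interval partition $\mathcal{P} = ([0,1/2), [1/2,1))$, its labels 1 and 2, and the angle $\alpha = (\sqrt5 - 1)/2$ are illustrative choices of ours (for irrational $\alpha$ such an interval partition is a standard example of a generating partition); floating-point arithmetic only approximates the true orbit.

```python
import math
from typing import List


def rotation_name(omega: float, alpha: float, n: int) -> List[int]:
    """(P, n)-name of omega under T(x) = x + alpha (mod 1).

    P = ([0, 1/2), [1/2, 1)) with labels 1 and 2, so the name is
    x_i(omega) = P(T^i omega) for i = 0, ..., n - 1.
    """
    name = []
    x = omega % 1.0
    for _ in range(n):
        name.append(1 if x < 0.5 else 2)
        x = (x + alpha) % 1.0
    return name


if __name__ == "__main__":
    golden = (math.sqrt(5) - 1) / 2  # an irrational angle
    print(rotation_name(0.1, golden, 20))
```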

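The space of ordered partitions of $X$ into $n$ sets comes with a metric $\rho = \rho_n$ defined by

$$\rho(\mathcal{P}, \mathcal{Q}) = \sum_{i=1}^{n} \mu(P_i \,\triangle\, Q_i)$$

for $\mathcal{P} = (P_1, \ldots, P_n)$ and $\mathcal{Q} = (Q_1, \ldots, Q_n)$ (here $\triangle$ denotes symmetric difference). The metric $\rho_n$ is complete; note, however, that if $\mathcal{P}_k \to \mathcal{P}$ in $\rho_n$ it may happen that some of the members of $\mathcal{P}$ are empty.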
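On a finite probability space the metric $\rho$ can be computed directly. The sketch below uses a toy model of our own ($X$ is a finite set with rational point masses, and a partition is given by the list of labels of its points) and also checks the identity $\rho(\mathcal{P}, \mathcal{Q}) = 2\mu(\{\omega : \mathcal{P}(\omega) \neq \mathcal{Q}(\omega)\})$, which holds because each misclassified point lies in exactly two of the symmetric differences; this identity is used in Section 3.

```python
from fractions import Fraction
from typing import Hashable, Sequence


def rho(p: Sequence[Hashable], q: Sequence[Hashable], weights: Sequence[Fraction]) -> Fraction:
    """rho(P, Q) = sum_i mu(P_i symmetric-difference Q_i).

    The space is {0, ..., m-1} with point masses `weights`; p[w] and q[w]
    are the labels of the sets of P and Q containing the point w.
    """
    total = Fraction(0)
    for label in set(p) | set(q):
        for w, a, b in zip(weights, p, q):
            if (a == label) != (b == label):  # w lies in exactly one of P_i, Q_i
                total += w
    return total


def mismatch_mass(p, q, weights) -> Fraction:
    """mu({w : P(w) != Q(w)})."""
    return sum((w for w, a, b in zip(weights, p, q) if a != b), Fraction(0))


if __name__ == "__main__":
    mu = [Fraction(1, 6)] * 6
    P = [1, 1, 2, 2, 3, 3]
    Q = [1, 2, 2, 2, 3, 1]
    print(rho(P, Q, mu), 2 * mismatch_mass(P, Q, mu))  # both 2/3
```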

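It is easy to check that if $\rho(\mathcal{P}, \mathcal{Q}) < \varepsilon$ then $\rho\big(\bigvee_{n=1}^{N} T^n\mathcal{P}, \bigvee_{n=1}^{N} T^n\mathcal{Q}\big) < N\varepsilon$. It follows that if $\mathcal{P}_k \to \mathcal{P}$ in $\rho$ and $(x^{(k)}_n)$, $(x_n)$ denote the processes arising from $\mathcal{P}_k, \mathcal{P}$ respectively, then the sequence of processes $(x^{(k)}_n)_{n=-\infty}^{\infty}$ converges to $(x_n)_{n=-\infty}^{\infty}$ in probability.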

Given a partition $\mathcal{P}$ of $X$ into $r$ sets and an integer $N$ we may consider the distribution that $\mu$ induces on $\{1, \ldots, r\}^N$, where the measure of a word $w = w_1 \ldots w_N \in \{1, \ldots, r\}^N$ is the measure of the set of points whose $(\mathcal{P}, N)$-name is $w$, or in other words $\mu\big(\bigcap_{n=0}^{N-1} T^{-n} P_{w_{n+1}}\big)$. We refer to this as the distribution of $N$-names determined by $\mathcal{P}$.

Since a distribution on $N$-names is just an $r^N$-dimensional probability vector, we can compare these distributions using, e.g., the $\ell^1$ metric. When we talk of closeness of $N$-name distributions, we will mean it in this sense. Note that if $\mathcal{P}, \mathcal{Q}$ are partitions and $\rho(\mathcal{P}, \mathcal{Q}) < \varepsilon$ then the distance between the $N$-name distributions associated with $\mathcal{P}$ and $\mathcal{Q}$ is at most $N\varepsilon$.
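The last remark can be checked on a toy example. In the sketch below, which is only an illustration and not a construction from the text, the "system" is the cyclic shift $\omega \mapsto \omega + 1$ on $\mathbb{Z}_m$ with uniform measure (periodic, hence not aperiodic, but adequate for counting names). We compute $N$-name distributions of two partitions that differ at a single point and compare their $\ell^1$ distance with $N\rho(\mathcal{P}, \mathcal{Q})$; the helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, Sequence, Tuple

Name = Tuple[int, ...]


def name_distribution(labels: Sequence[int], N: int) -> Dict[Name, Fraction]:
    """Distribution of (P, N)-names on Z_m, uniform measure, T(w) = w + 1 mod m.

    labels[w] is P(w); the name of w is (P(w), P(Tw), ..., P(T^{N-1} w)).
    """
    m = len(labels)
    counts = Counter(tuple(labels[(w + i) % m] for i in range(N)) for w in range(m))
    return {name: Fraction(c, m) for name, c in counts.items()}


def l1_distance(d1: Dict[Name, Fraction], d2: Dict[Name, Fraction]) -> Fraction:
    return sum((abs(d1.get(k, 0) - d2.get(k, 0)) for k in set(d1) | set(d2)), Fraction(0))


def rho_uniform(p: Sequence[int], q: Sequence[int]) -> Fraction:
    """rho = 2 * mu(P != Q) for the uniform measure on Z_m."""
    return 2 * Fraction(sum(a != b for a, b in zip(p, q)), len(p))


if __name__ == "__main__":
    P = [1, 2, 1, 1, 2, 2, 1, 2, 1, 1]
    Q = P.copy()
    Q[3] = 2  # change the partition at a single point
    for N in (1, 2, 3, 5):
        d = l1_distance(name_distribution(P, N), name_distribution(Q, N))
        print(N, d, N * rho_uniform(P, Q))  # d never exceeds N * rho
```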



2.2. Entropy. Let $\mathcal{X} = (X, \mathcal{B}, \mu, T)$ be an invertible ergodic measure preserving system and $\mathcal{P} = (P_i)$ a partition. The entropy of a partition $\mathcal{P}$ is

$$H(\mathcal{P}) = -\sum_i \mu(P_i) \log \mu(P_i)$$

(all logarithms are to base 2 unless specified otherwise). $H(\mathcal{P})$ is non-negative and finite (define $0 \log 0 = 0$). The entropy of the system $\mathcal{X}$ with respect to $\mathcal{P}$ (equivalently, the entropy of the process arising from $\mathcal{P}$) is

$$h(\mathcal{X}, \mathcal{P}) = \lim_{n\to\infty} \frac{1}{n} H\big(\mathcal{P} \vee T\mathcal{P} \vee \cdots \vee T^{n-1}\mathcal{P}\big);$$

the limit above can be shown to exist. The entropy of $\mathcal{X}$ is

$$h(\mathcal{X}) = \sup\{h(\mathcal{X}, \mathcal{P}) : \mathcal{P} \text{ a finite partition of } X\}$$
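For a concrete instance of these definitions, consider a stationary Markov chain with the partition according to the time-zero symbol (an example we introduce for illustration; it is not used in the text). There the block entropies can be computed exactly and satisfy $H(\mathcal{P} \vee \cdots \vee T^{n-1}\mathcal{P}) = H(\pi) + (n-1)h$, so $\frac{1}{n}H(\cdots)$ decreases to the entropy rate $h = \sum_i \pi_i H(P_{i\cdot})$. The sketch enumerates all $n$-words, so it is only practical for small $n$.

```python
import itertools
import math
from typing import Iterable, Sequence


def entropy(probs: Iterable[float]) -> float:
    """H = -sum p log2 p, with the convention 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def block_entropy(pi: Sequence[float], P: Sequence[Sequence[float]], n: int) -> float:
    """Entropy of the n-block distribution of a stationary Markov chain (pi, P)."""
    probs = []
    for word in itertools.product(range(len(pi)), repeat=n):
        p = pi[word[0]]
        for a, b in zip(word, word[1:]):
            p *= P[a][b]
        probs.append(p)
    return entropy(probs)


def markov_entropy_rate(pi, P) -> float:
    """h = sum_i pi_i * H(row i of P)."""
    return sum(pi_i * entropy(row) for pi_i, row in zip(pi, P))


if __name__ == "__main__":
    pi, P = [5 / 6, 1 / 6], [[0.9, 0.1], [0.5, 0.5]]  # stationary: pi P = pi
    for n in (1, 2, 4, 8):
        print(n, block_entropy(pi, P, n) / n)
    print("h =", markov_entropy_rate(pi, P))
```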

If $\mathcal{P}$ is a finite generating partition then $h(\mathcal{X}) = h(\mathcal{X}, \mathcal{P})$, but the relation $h(\mathcal{X}) = h(\mathcal{X}, \mathcal{P})$ is not in itself enough to guarantee that $\mathcal{P}$ generates. However, the Krieger generator theorem [?] guarantees that if $h(\mathcal{X}) < \log k$ for an integer $k$ then there exists a generating partition $\mathcal{P} = (P_1, \ldots, P_k)$ of $\mathcal{X}$ into $k$ sets.

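In the space of partitions of $X$ into $n$ sets, the entropy is continuous in the metric $\rho_n$: that is, for a partition $\mathcal{P}$ and every $\varepsilon > 0$ there is a $\delta > 0$ such that if $\rho(\mathcal{P}, \mathcal{Q}) < \delta$ then $|h(\mathcal{X}, \mathcal{P}) - h(\mathcal{X}, \mathcal{Q})| < \varepsilon$.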

The main fact about entropy we will use is the following classical theorem: 

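Theorem 2.1 (Shannon–McMillan–Breiman theorem). For any finite partition $\mathcal{P}$ of $X$ and almost every $x \in X$,

$$-\frac{1}{n} \log \mu\Big(\Big(\bigvee_{i=0}^{n-1} T^{-i}\mathcal{P}\Big)(x)\Big) \to h(\mathcal{X}, \mathcal{P}),$$

where $\big(\bigvee_{i=0}^{n-1} T^{-i}\mathcal{P}\big)(x)$ denotes the atom of this partition containing $x$.

A proof can be found in [?] p. 55.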
Denote

$$\mu(u) = \mu(\{x \in X : \text{the } (\mathcal{P}, n)\text{-name of } x \text{ is } u\}).$$

With this notation the Shannon–McMillan–Breiman theorem states that

$$-\frac{1}{n} \log \mu(x_1 \ldots x_n) \to h(\mathcal{X}, \mathcal{P})$$

almost surely, where $(x_n)$ is the process arising from $\mathcal{P}$.
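The Shannon–McMillan–Breiman theorem can be watched numerically. The sketch below samples a path of a two-state Markov chain (an illustrative choice of ours, with transition matrix rows $(0.9, 0.1)$ and $(0.5, 0.5)$ and stationary vector $(5/6, 1/6)$), evaluates $-\frac{1}{n}\log \mu(x_1 \ldots x_n)$ exactly from the transition probabilities, and compares it with the entropy rate. It shows a single sample path; the theorem is the almost-sure statement behind it.

```python
import math
import random
from typing import List, Sequence


def _draw(weights: Sequence[float], rng: random.Random) -> int:
    """Sample an index with the given probabilities."""
    u, acc = rng.random(), 0.0
    for i, w in enumerate(weights):
        acc += w
        if u < acc:
            return i
    return len(weights) - 1


def sample_markov(pi, P, n: int, rng: random.Random) -> List[int]:
    """A path x_1 ... x_n of the stationary Markov chain (pi, P)."""
    path = [_draw(pi, rng)]
    for _ in range(n - 1):
        path.append(_draw(P[path[-1]], rng))
    return path


def smb_statistic(path: Sequence[int], pi, P) -> float:
    """-(1/n) log2 mu(x_1 ... x_n) for the stationary Markov measure."""
    logp = math.log2(pi[path[0]])
    for a, b in zip(path, path[1:]):
        logp += math.log2(P[a][b])
    return -logp / len(path)


if __name__ == "__main__":
    pi, P = [5 / 6, 1 / 6], [[0.9, 0.1], [0.5, 0.5]]
    h = sum(p_i * -sum(q * math.log2(q) for q in row) for p_i, row in zip(pi, P))
    path = sample_markov(pi, P, 100_000, random.Random(0))
    print(smb_statistic(path, pi, P), h)  # close to each other
```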

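Also, for partitions $\mathcal{P}, \mathcal{Q}$ and $(u, v) \in \mathbb{N}^n \times \mathbb{N}^n$, we say that $(u, v)$ is the $(\mathcal{P} \times \mathcal{Q}, n)$-name of a point $\omega \in X$ if $u$ is the $(\mathcal{P}, n)$-name of $\omega$ and $v$ is the $(\mathcal{Q}, n)$-name of $\omega$. This is just another way of talking about the partition $\mathcal{P} \vee \mathcal{Q}$. Denote

$$\mu(v \mid u) = \frac{\mu(\{x \in X : \text{the } (\mathcal{P} \times \mathcal{Q}, n)\text{-name of } x \text{ is } (u, v)\})}{\mu(\{x \in X : \text{the } (\mathcal{P}, n)\text{-name of } x \text{ is } u\})}.$$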

We will actually use the following "relative" version of the Shannon-McMillan- 
Breimann theorem: 
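Theorem 2.2 (Relative Shannon–McMillan–Breiman). Let $\mathcal{P}, \mathcal{Q}$ be partitions of $X$ with entropies $h(\mathcal{X}, \mathcal{P}) = s \le t = h(\mathcal{X}, \mathcal{Q})$, and suppose that $h(\mathcal{X}, \mathcal{P} \vee \mathcal{Q}) = t$ (this holds, for instance, when $\mathcal{Q}$ generates). For every $\varepsilon > 0$ there are collections of words $A_n \subseteq \mathbb{N}^n \times \mathbb{N}^n$ for $n = 1, 2, 3, \ldots$ such that

(1) $\#\{u \in \mathbb{N}^n : (u, v) \in A_n \text{ for some } v\} < 2^{(s+\varepsilon)n}$ for every $n$.

(2) $\#\{v \in \mathbb{N}^n : (u, v) \in A_n\} < 2^{(t-s+\varepsilon)n}$ for every $n$ and every $u$.

(3) For almost every point $x \in X$ the $(\mathcal{P} \times \mathcal{Q}, n)$-name of $x$ is in $A_n$ for all sufficiently large $n$.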

Proof. Define

$$A_n = \{(u, v) \in \mathbb{N}^n \times \mathbb{N}^n : \mu(u) > 2^{-(s+\varepsilon)n} \text{ and } \mu(v \mid u) > 2^{-(t-s+\varepsilon)n}\}.$$

The fact that for almost every $x \in X$ the $(\mathcal{P} \times \mathcal{Q}, n)$-name of $x$ is eventually in $A_n$ follows from the Shannon–McMillan–Breiman theorem, applied once to the partition $\mathcal{P}$ and once to the partition $\mathcal{P} \vee \mathcal{Q}$. The estimates on the number of $u$'s represented in $A_n$ and on the number of $v$'s associated to a given $u$ in $A_n$ follow easily from the definition, since the masses $\mu(u)$ of the $u$'s, and the conditional masses $\mu(v \mid u)$ of the $v$'s associated to a given $u$, must each add up to at most 1. □
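The counting in the last step is elementary: if nonnegative masses sum to at most 1, fewer than $2^{a}$ of them can exceed $2^{-a}$. The hedged sketch below checks this for both counts in Theorem 2.2 on a randomly generated toy joint distribution of word pairs $(u, v)$; the distribution and the integers $a, b$ (which play the roles of $(s+\varepsilon)n$ and $(t-s+\varepsilon)n$) are our own illustrative choices, and exact rational arithmetic avoids rounding issues.

```python
import random
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Hashable, Iterable, Tuple


def count_above(masses: Iterable[Fraction], threshold: Fraction) -> int:
    """Number of masses strictly above threshold (< 1/threshold if they sum to <= 1)."""
    return sum(1 for p in masses if p > threshold)


def relative_counts(joint: Dict[Tuple[Hashable, Hashable], Fraction], a: int, b: int) -> Tuple[int, int]:
    """For a joint law mu(u, v), return
    (#u with mu(u) > 2^-a,  max over u of #v with mu(v | u) > 2^-b)."""
    marginal = defaultdict(Fraction)
    for (u, _), p in joint.items():
        marginal[u] += p
    n_u = count_above(marginal.values(), Fraction(1, 2 ** a))
    worst = 0
    for u, pu in marginal.items():
        if pu == 0:
            continue
        cond = [p / pu for (u2, _), p in joint.items() if u2 == u]
        worst = max(worst, count_above(cond, Fraction(1, 2 ** b)))
    return n_u, worst


if __name__ == "__main__":
    rng = random.Random(2)
    words = [format(i, "03b") for i in range(8)]  # all binary words of length 3
    raw = {(u, v): rng.randint(1, 50) for u in words for v in words}
    total = sum(raw.values())
    joint = {key: Fraction(w, total) for key, w in raw.items()}
    for a, b in [(2, 2), (3, 3), (5, 4)]:
        print((a, b), relative_counts(joint, a, b), (2 ** a, 2 ** b))
```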

2.3. Towers. A tower of height $n$ in $X$ is a set of the form $B \cup TB \cup T^2B \cup \cdots \cup T^{n-1}B \subseteq X$ such that the sets $T^iB$ are measurable and pairwise disjoint for $i = 0, \ldots, n-1$. The set $B$ is called the base of the tower, and the set $T^iB$ is called the $i$-th level of the tower.

Given a partition $\mathcal{P} = (P_i)$ and a tower $\bigcup_{i=0}^{n-1} T^iB$, we can partition the base $B$ into disjoint (possibly empty) sets $B_u$ indexed by words $u \in \mathbb{N}^n$, such that

$$B_u = \{\omega \in B : u \text{ is the } (\mathcal{P}, n)\text{-name of } \omega\}.$$

This partitions the tower into disjoint subtowers $\bigcup_{i=0}^{n-1} T^iB_u$ whose base is $B_u$; these subtowers are called columns. Each level $T^iB_u$ is contained entirely in the element $P_{u_{i+1}}$ of $\mathcal{P}$. Put another way, if $(x_n)$ is the process associated with $\mathcal{P}$ then for $\omega \in B_u$ the first $n$ outputs $(x_0(\omega), \ldots, x_{n-1}(\omega))$ of the process are equal to $u = (u_1, \ldots, u_n)$.
We will need two tower lemmas. 

Lemma 2.3 (Kakutani towers lemma). Let $B$ be a set of positive measure and $N$ an integer. Then the space $X$ can be partitioned into countably many pairwise disjoint towers, all of height no less than $N$, all of whose bases are subsets of $B$.
Proof. Since $\mathcal{X}$ is aperiodic we can choose a set $B' \subseteq B$ of positive measure such that if $x \in B'$ then $T^ix \notin B'$ for $1 \le i < N$. Partition $B'$ according to the first return time to $B'$, i.e. let

$$B^{(n)} = \{x \in B' : n \text{ is the first positive integer such that } T^nx \in B'\}.$$

Then for each $n \ge N$ we have a tower $B^{(n)} \cup TB^{(n)} \cup \cdots \cup T^{n-1}B^{(n)}$; these towers are pairwise disjoint, and by ergodicity their union fills $X$ up to measure zero. □
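The first-return construction in this proof can be traced on a toy model: the cyclic rotation $x \mapsto x + 1$ on $\mathbb{Z}_m$ (periodic, hence only an illustration), with a hand-picked set standing in for $B'$ whose points are at least $N$ apart. Grouping base points by their first return time yields towers that exactly tile the space, each of height at least $N$. The function name is ours.

```python
from typing import Dict, Iterable, List


def first_return_towers(m: int, base: Iterable[int]) -> Dict[int, List[List[int]]]:
    """Towers over `base` for T(x) = x + 1 mod m, grouped by first-return time n.

    Each tower is the list [x, Tx, ..., T^{n-1}x] for a point x of the base.
    """
    pts = sorted(set(b % m for b in base))
    if not pts:
        raise ValueError("base must be non-empty")
    towers: Dict[int, List[List[int]]] = {}
    for idx, x in enumerate(pts):
        nxt = pts[(idx + 1) % len(pts)]
        n = (nxt - x) % m or m  # first positive n with T^n x in the base
        towers.setdefault(n, []).append([(x + i) % m for i in range(n)])
    return towers


if __name__ == "__main__":
    towers = first_return_towers(20, [0, 5, 12, 16])
    print(sorted(towers))  # tower heights: [4, 5, 7]
```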

A stronger result is a version of the Rohlin lemma whose proof can be found in [?].

Lemma 2.4 (Strong Rohlin lemma). Let $\mathcal{P} = \{P_1, \ldots, P_k\}$ be a partition of $X$ and $\varepsilon > 0$. Then for every $N$ there is a tower $B \cup TB \cup \cdots \cup T^{N-1}B$ of height $N$ whose complement is of measure at most $\varepsilon$ and such that the partition $\mathcal{Q} = \{B \cap P_1, \ldots, B \cap P_k\}$ induced on $B$ by $\mathcal{P}$ has the same distribution relative to $B$ as $\mathcal{P}$ has relative to $X$.

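Corollary 2.5. Given $A \subseteq X$ with $\mu(A) > 1 - \varepsilon$ and any $N$, there is a tower $B \cup TB \cup \cdots \cup T^{N-1}B$ in $X$ filling all but $2\varepsilon$ of the space and with $B \subseteq A$.

Proof. Let $C \cup TC \cup \cdots \cup T^{N-1}C$ be the tower provided by the strong Rohlin lemma with respect to the partition $\{A, X \setminus A\}$ and set $B = C \cap A$. Since $\mu(C \setminus A) = \mu(C)\mu(X \setminus A) < \varepsilon\mu(C)$ and $N\mu(C) \le 1$, the levels over $C \setminus A$ have total mass less than $\varepsilon$; together with the complement of the original tower, less than $2\varepsilon$ of the space is not covered. □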

2.4. Approximation methods for partitions. Often a generating partition with some property is constructed by approximation, that is, a sequence of partitions is defined satisfying more and more of our requirements and which converges in $\rho$ to a partition with the properties we want. Below we outline some of the tools we use for such constructions.

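If $\mathcal{A}$ is a partition or a $\sigma$-algebra of measurable sets and $B$ is a measurable set then we write $B \subseteq_\varepsilon \mathcal{A}$ to indicate that there is a set $A \in \mathcal{A}$ (for a partition, a set in the algebra it generates) such that $\mu(A \triangle B) < \varepsilon$. Clearly $B \in \mathcal{A}$ (up to measure zero) iff $B \subseteq_\varepsilon \mathcal{A}$ for every $\varepsilon > 0$. For a partition $\mathcal{P}$ we write $\mathcal{P} \subseteq_\varepsilon \mathcal{A}$ if $P_i \subseteq_\varepsilon \mathcal{A}$ for every $P_i \in \mathcal{P}$.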

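Let $\mathcal{P}$ be a generating partition for $\mathcal{X}$ and suppose that $\mathcal{Q}$ is a partition such that, for every $\varepsilon > 0$, there is an $N$ such that $\mathcal{P} \subseteq_\varepsilon \bigvee_{n=-N}^{N} T^n\mathcal{Q}$. It follows that $\mathcal{P} \subseteq \bigvee_{n=-\infty}^{\infty} T^n\mathcal{Q}$, and since $\bigvee_{n=-\infty}^{\infty} T^n\mathcal{Q}$ is $T$-invariant, $\mathcal{B} = \bigvee_{n=-\infty}^{\infty} T^n\mathcal{P} \subseteq \bigvee_{n=-\infty}^{\infty} T^n\mathcal{Q}$. Thus $\mathcal{Q}$ generates.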

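Suppose $\mathcal{P}, \mathcal{Q}$ are partitions of $X$ into $n$ elements and $A \subseteq_\varepsilon \mathcal{P}$. Then if $\rho(\mathcal{P}, \mathcal{Q}) < \delta$ we have $A \subseteq_{\varepsilon+\delta} \mathcal{Q}$. Thus if $A \subseteq_\varepsilon \bigvee_{n=1}^{N} T^n\mathcal{P}$ and $\rho(\mathcal{P}, \mathcal{Q}) < \delta$ then $A \subseteq_{\varepsilon+N\delta} \bigvee_{n=1}^{N} T^n\mathcal{Q}$.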
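These observations are essentially the proof of the following lemma; see also [5] p. 79:

Lemma 2.6. Let $(\mathcal{P}_k)_{k=1}^{\infty}$ be a sequence of partitions of $X$ and $\mathcal{Q}$ a partition of $X$. Suppose that $\rho(\mathcal{P}_{k-1}, \mathcal{P}_k) < \varepsilon(k)$ and $\mathcal{Q} \subseteq_{\varepsilon(k)} \bigvee_{j=-N(k)}^{N(k)} T^j\mathcal{P}_k$ for some sequences $\varepsilon(k) > 0$ and $N(k) \in \mathbb{N}$ which satisfy $\sum_{k=1}^{\infty} \varepsilon(k) < \infty$ and $N(k) \cdot \sum_{j=k+1}^{\infty} \varepsilon(j) \to 0$ as $k \to \infty$. Then $(\mathcal{P}_k)$ converges to a partition $\mathcal{P}$ and $\mathcal{Q} \subseteq \bigvee_{i=-\infty}^{\infty} T^i\mathcal{P}$.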

The following theorem shows that in order to change a partition $\mathcal{P}$ into a generating partition, it suffices to perturb $\mathcal{P}$ by an amount of the same order as the difference $h(\mathcal{X}) - h(\mathcal{X}, \mathcal{P})$. This result is not new but we include a proof for completeness.

Theorem 2.7 (Entropy and generating partitions). Let $h > 0$ and let $k$ be an integer with $\log k > h$. Let $\mathcal{X} = (X, \mathcal{B}, \mu, T)$ be an aperiodic ergodic system with entropy $h$ and let $\mathcal{P} = (P_1, \ldots, P_k)$ be a partition of $X$ with $h(\mathcal{X}, \mathcal{P}) = h'$ (so $h' \le h$). Then for every $\delta > 0$ there is a generating partition $\mathcal{P}' = (P'_1, \ldots, P'_k)$ of $\mathcal{X}$ such that $\rho(\mathcal{P}, \mathcal{P}') < \delta + \frac{h-h'}{\log k - h'}$. In particular, the generating partitions are dense in the $\rho$-metric among the partitions of maximal entropy.

Remark. The parameter $\delta$ was introduced only in order to deal with the case $h = h'$. The fact that the generating partitions are dense among the partitions of maximal entropy is known, but we are unable to find a reference.

Proof. Let $\delta > 0$ be given. Fix a very small $\varepsilon > 0$ which will be determined later. Fix a generating partition $\mathcal{Q}$ of size $k$ (it exists by the Krieger generator theorem, since $h < \log k$), and for $n = 1, 2, 3, \ldots$ let $A_n \subseteq \mathbb{N}^n \times \mathbb{N}^n$ be as in Theorem 2.2 for the partitions $\mathcal{P}, \mathcal{Q}$ and parameter $\varepsilon$. Let $N > 1/\varepsilon^2$ be large enough that the set $X_0$ of $\omega$'s whose $(\mathcal{P} \times \mathcal{Q}, n)$-name is in $A_n$ for all $n \ge N$ has positive measure. Applying Lemma 2.3 we can partition the space $X$ into disjoint towers of height at least $N$ whose bases are contained in $X_0$; that is, for each $n \ge N$ we get disjoint towers $B^{(n)} \cup TB^{(n)} \cup \cdots \cup T^{n-1}B^{(n)}$ of height $n$ with $B^{(n)} \subseteq X_0$, and the union of these towers has full measure. Partition the bases according to $A_n$, so for a word $(u, v) \in A_n$ the set $B^{(n)}_{u,v}$ consists of the points of $B^{(n)}$ whose $(\mathcal{P} \times \mathcal{Q}, n)$-name is $(u, v)$.

We construct a partition $\mathcal{P}'$ by modifying the labels of some levels of the columns $B^{(n)}_{u,v}$. The construction proceeds in three stages.
Marking the base: Fix $m = 1/\varepsilon$ (for simplicity we ignore rounding errors and treat $m$ as an integer, and adopt a similar philosophy later as well). Label the lower $2m$ levels of each column $B^{(n)}_{u,v}$ (i.e. the levels indexed $0$ to $2m-1$) with 1's and mark levels $2m, 3m, \ldots, \lfloor n/m \rfloor m$ with 0's.

The result of this procedure is that given any point $\omega \in \bigcup_{i=0}^{n-1} T^iB^{(n)}_{u,v}$, the base of the column can be identified as the largest index $i \in \{-n, -n+1, \ldots, 0\}$ such that the $(\mathcal{P}', 2m)$-name of $T^i\omega$ consists of all 1's. Thus given the $\mathcal{P}'$ itinerary of $\omega$ from time $-n$ to $n$, we can reconstruct the $\mathcal{P}'$-name of the column to which $\omega$ belongs. We will preserve this property in the following steps; hence, with probability 1, given the $\mathcal{P}'$ itinerary of a point from time $-\infty$ to $\infty$ we can determine the $n$ corresponding to the column the point belongs to, and the $\mathcal{P}'$-name of that column.
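A small sketch of the marking stage, under two simplifying assumptions of ours: columns are laid out one after another (as along an orbit), and $n - 1$ is a multiple of $m$, so that the top level of every column carries a $0$ mark and every window of length $2m$ starting above the bottom level of a column contains a marked level. Under these assumptions the base of the column containing position $p$ is recovered as the largest $b \le p$ starting a run of $2m$ ones. The helper names and the sizes $m = 3$, $n = 13$ are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def mark_column(body: Sequence[int], m: int) -> List[int]:
    """Apply 'marking the base' to one column (labels of levels 0..n-1).

    Levels 0..2m-1 become 1; levels 2m, 3m, ... (multiples of m) become 0;
    all other levels keep their labels from `body`.
    """
    col = list(body)
    for level in range(len(col)):
        if level < 2 * m:
            col[level] = 1
        elif level % m == 0:
            col[level] = 0
    return col


def find_base(labels: Sequence[int], p: int, m: int, n: int) -> Optional[int]:
    """Largest b in [p - n, p] such that labels[b : b + 2m] is all 1's."""
    for b in range(p, max(p - n, 0) - 1, -1):
        window = labels[b:b + 2 * m]
        if len(window) == 2 * m and all(s == 1 for s in window):
            return b
    return None


if __name__ == "__main__":
    m, n = 3, 13  # hypothetical sizes with (n - 1) % m == 0
    rng = random.Random(3)
    labels: List[int] = []
    for _ in range(4):
        labels += mark_column([rng.randrange(2) for _ in range(n)], m)
    print(find_base(labels, 20, m, n))  # 13: position 20 lies in the column starting at 13
```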
Coding the $\mathcal{Q}$-itinerary into $\mathcal{P}'$: Denote $A_n(u) = \{v : (u, v) \in A_n\} \subseteq \mathbb{N}^n$. Fix $(u, v) \in A_n$ and enumerate $A_n(u) = \{v_1, \ldots, v_r\}$ in a way depending only on $u$; by Theorem 2.2, $|A_n(u)| < 2^{(h-h'+\varepsilon)n}$.

We modify the column over $B^{(n)}_{u,v}$ so as to record the index $i$ for which $v = v_i$. We do this by writing the base-$k$ representation of $i$ near the bottom of the column. To be precise, we record the base-$k$ digits of $i$ starting at level $2m+1$ and writing consecutively in blocks of $m-1$, skipping levels whose height is $0 \bmod m$ so as not to overwrite what we did in the previous stage. Since there are at most $2^{(h-h'+\varepsilon)n}$ possible values for $i$, we need to overwrite $n(h-h'+\varepsilon)\log_k 2$ levels of the column.

The result of this procedure is that if we know both the $(\mathcal{P}, n)$-name (the word $u$) and the $(\mathcal{P}', n)$-name of a point in the base $B^{(n)}_{u,v}$, we can deduce its $(\mathcal{Q}, n)$-name (the word $v$) by extracting the index $i$ coded just above the base marker in the $(\mathcal{P}', n)$-name, and looking at the $i$-th word in the list $A_n(u)$.
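The coding step stores an index $i$ into the list $A_n(u)$ as base-$k$ digits; with $r \le 2^{(h-h'+\varepsilon)n}$ possible indices this takes about $\log_k r \le n(h-h'+\varepsilon)\log_k 2$ digits. A minimal sketch of the encoding and decoding (indices run from 0 rather than 1, the function names are ours, and the placement of the digits on levels $2m+1$ onward, skipping multiples of $m$, is not modelled):

```python
from typing import List, Sequence


def digits_needed(r: int, k: int) -> int:
    """Smallest d >= 1 with k**d >= r, i.e. enough base-k digits for indices 0..r-1."""
    d, capacity = 1, k
    while capacity < r:
        d += 1
        capacity *= k
    return d


def encode_index(i: int, k: int, d: int) -> List[int]:
    """Base-k digits of i, most significant first, padded to length d."""
    digits = []
    for _ in range(d):
        i, rem = divmod(i, k)
        digits.append(rem)
    if i:
        raise ValueError("index does not fit in d digits")
    return digits[::-1]


def decode_index(digits: Sequence[int], k: int) -> int:
    """Inverse of encode_index."""
    i = 0
    for digit in digits:
        i = i * k + digit
    return i


if __name__ == "__main__":
    k, r = 3, 50  # hypothetical alphabet size and list length |A_n(u)|
    d = digits_needed(r, k)
    print(d, encode_index(41, k, d))  # 4 [1, 1, 1, 2]
```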
Re-coding the "P-itinerary: Fix again (u,v) G A n . The "P-name of 

(n) 

the column Bu,v has been partly destroyed by the previous steps. We 
will fix this by overwriting still more of the "P-name, starting where 
we stopped at the previous stage, skipping levels which are at height 
mod m, and stopping at some height M = M(n) which we will 
determine. This gives us M — (2m + ^ + n{h — b! + e) log fc 2) symbols 
in which to store information. In this space we want to record the 
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portion of the name u which has been overwritten in all three stages 

(including the current stage). This consists of the first M symbols 

of u plus at most ^ additional levels overwritten in the first stage. 

Assuming as we may that M > en > N, we know that the number 

of possibilities for the first M symbols of u is bounded by 2^ h ' +£ ^ M so 

using the k symbols at our disposal we need M{h' + e) log fc 2 symbols 

in order to record it, plus another ^ symbols to record what was 

erased in the first stage. Thus we require of M that in addition to 

en < M < n it satisfy the inequality 

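$$M - \Big(2m + \frac{n}{m} + n(h-h'+\varepsilon)\log_k 2\Big) > M(h'+\varepsilon)\log_k 2 + \frac{n}{m}$$

or equivalently

$$M > \frac{(h-h'+\varepsilon)\log_k 2 + 2\big(\frac{1}{m} + \frac{m}{n}\big)}{1 - (h'+\varepsilon)\log_k 2}\, n$$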

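Since $h' \le h < \log k$, $\frac{n}{m} = \varepsilon n$ and $m = \frac{1}{\varepsilon} \le \varepsilon n$ (because $n \ge N > 1/\varepsilon^2$), when $\varepsilon$ is small enough it suffices that

$$\frac{M}{n} > \frac{(h-h'+\varepsilon)\log_k 2 + 4\varepsilon}{1 - (h'+\varepsilon)\log_k 2}.$$

Denote the right-hand side by $C(\varepsilon)$. Note that $C(\varepsilon) \to \frac{h-h'}{\log k - h'}$ as $\varepsilon \to 0$ and $0 < C(\varepsilon) < 1$. Thus if we choose $\varepsilon > 0$ small enough (in a manner depending only on $h$, $h'$ and $k$) we can set $M = \max\{\varepsilon, C(\varepsilon)\} \cdot n$ and $M$ will satisfy all the requirements, including $\varepsilon n \le M < n$.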
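The choice of $M$ hinges on the coefficient $C(\varepsilon)$. The sketch below evaluates it, using the slack term $4\varepsilon$ as written above, and checks numerically that $C(\varepsilon) \to (h-h')/(\log k - h')$ as $\varepsilon \to 0$ (at $\varepsilon = 0$ the two expressions agree after multiplying numerator and denominator by $\log_2 k$, since $\log_k 2 = 1/\log_2 k$). The numerical values of $h$, $h'$ and $k$ are illustrative.

```python
import math


def coefficient_C(eps: float, h: float, h_prime: float, k: int) -> float:
    """C(eps) = ((h - h' + eps) log_k 2 + 4 eps) / (1 - (h' + eps) log_k 2)."""
    log_k_2 = 1.0 / math.log2(k)
    denom = 1.0 - (h_prime + eps) * log_k_2
    if denom <= 0:
        raise ValueError("need h' + eps < log2 k")
    return ((h - h_prime + eps) * log_k_2 + 4 * eps) / denom


def limit_C(h: float, h_prime: float, k: int) -> float:
    """(h - h') / (log2 k - h'), the value of C at eps = 0."""
    return (h - h_prime) / (math.log2(k) - h_prime)


if __name__ == "__main__":
    h, h_prime, k = 1.0, 0.5, 4  # hypothetical values with h' <= h < log2 k
    for eps in (0.05, 0.01, 0.001):
        print(eps, coefficient_C(eps, h, h_prime, k))
    print("limit", limit_C(h, h_prime, k))  # approximately 1/3
```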

The result of this procedure is that given the $(\mathcal{P}', n)$-name of a point in the base of the tower column $B^{(n)}_{u,v}$, we can reconstruct its $(\mathcal{P}, n)$-name by looking at the data written in this step, and hence, by the previous step, its $(\mathcal{Q}, n)$-name. Together with the previous stages, this means that for any point $\omega \in X$, if we know its entire $\mathcal{P}'$ itinerary we can determine the column it is in and the $\mathcal{P}'$-name of that column, and hence $\mathcal{Q}(\omega)$. This means that $\mathcal{P}'$ generates.



It remains to estimate how much $\mathcal{P}$ has changed. We have modified $M + \frac{n}{m}$ levels of each column $B^{(n)}_{u,v}$, that is, a $(\max\{\varepsilon, C(\varepsilon)\} + \varepsilon)$-fraction of the mass of that column; summing over all columns, this is the fraction of $X$ that has changed. For $\varepsilon > 0$ sufficiently small this is less than $\delta + \frac{h-h'}{\log k - h'}$, implying that $\rho(\mathcal{P}, \mathcal{P}') < \delta + \frac{h-h'}{\log k - h'}$. This completes the proof. □
3. Zero-entropy extensions



This section is dedicated to proving our main theorem, Theorem 3.1. Before going into the details, we would like to say a few words about the relation of this theorem to the work of Ornstein and Weiss in [7], where it was shown that entropy is the only finitely observable invariant in some saturated classes of processes. Their proof used a diagonalization argument: assuming to the contrary that for some class $\mathcal{C}$ there exists a finitely observable invariant finer than entropy, choose two non-isomorphic processes $(x_n), (y_n) \in \mathcal{C}$ with the same entropy $h$. A third process $(z_n)$ is then constructed, for which the observation scheme does not converge. This is done by inductively defining the $N$-block distributions for the process $(z_n)$ for a sequence of rapidly increasing $N$'s, where at each step Rohlin towers and copying lemmas are used to make $(z_n)$ look, at different time scales, as though it comes from $\mathcal{X}$ or $\mathcal{Y}$. However, in order to obtain a contradiction it must be ensured that $(z_n) \in \mathcal{C}$, since otherwise the observation scheme is not expected to converge. With some care one can ensure that $(z_n)$ is Bernoulli if $h > 0$, or weak mixing and deterministic if $h = 0$, but other properties, such as pure point spectrum or being non-Bernoulli in positive entropy, are harder to build into $(z_n)$.

Our results derive from the observation that when $(x_n)$ is a zero-entropy extension of $(y_n)$, one can control the isomorphism class of the diagonal process $(z_n)$, and in fact it can be made isomorphic to $(y_n)$.

Theorem 3.1. Suppose $\mathcal{X} \to \mathcal{Y}$ is a zero entropy extension of finite entropy dynamical systems. Let $\mathcal{C}$ be the family of processes arising from $\mathcal{X}$ and $\mathcal{Y}$. Then every finitely observable invariant for $\mathcal{C}$ is constant.

Proof. We identify $\mathcal{Y}$ with the sub-$\sigma$-algebra of $\mathcal{X}$ which is the pull-back of the $\sigma$-algebra of $\mathcal{Y}$ through the factor map. Let $r \in \mathbb{N}$ with $\log r > h(\mathcal{X})$; all partitions in the sequel are partitions into $r$ sets.

To simplify notation we assume that $(s_n)$ is an observation scheme whose range is $\mathbb{R}$; there is no loss of generality here, since given some other range we can always compose with continuous functions from the range to $\mathbb{R}$. Suppose that there are $\xi, \eta \in \mathbb{R}$ such that for every pair of processes $(x_n), (y_n)$ arising from $\mathcal{X}, \mathcal{Y}$ respectively (that is, from generating partitions of them),

$$\lim_{n\to\infty} s_n(x_1 \ldots x_n) = \xi \quad \text{in probability},$$

$$\lim_{n\to\infty} s_n(y_1 \ldots y_n) = \eta \quad \text{in probability}.$$
We must show that $\eta = \xi$. In order to do this we will construct a generating partition $\mathcal{P}^*$ of $\mathcal{Y}$ and a sequence $N(k)$ such that $s_{N(k)}(y^*_1 \ldots y^*_{N(k)}) \to \xi$ in probability (here $(y^*_n)$ is the process arising from $\mathcal{P}^*$). This suffices because, by assumption, $\lim s_n(y^*_1 \ldots y^*_n) = \eta$ in probability, so $\eta = \xi$.

The partition $\mathcal{P}^*$ will be obtained as the limit of a sequence of generating partitions of $\mathcal{Y}$, which will be constructed inductively. The induction step is provided by the following lemma:

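Lemma 3.2. For any generating partition $\mathcal{P}$ of $\mathcal{Y}$ and any $\varepsilon > 0$, there is a generating partition $\mathcal{P}'$ of $\mathcal{Y}$ with $\rho(\mathcal{P}, \mathcal{P}') < \varepsilon$, and an integer $N$ so that

$$P\big(|s_N(y_1 \ldots y_N) - \xi| < \varepsilon\big) > 1 - \varepsilon,$$

where $(y_n)$ is the process arising from $\mathcal{P}'$.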

Before proving the lemma let us show how it is used to prove the theorem. We construct a sequence $\mathcal{P}^{(k)}$ of generating partitions of $\mathcal{Y}$ and associated processes $(y^{(k)}_n)$, starting with an arbitrary generating partition $\mathcal{P}^{(0)}$ provided by the Krieger generator theorem.

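At the induction step, given $\mathcal{P}^{(k-1)}$ we construct $\mathcal{P}^{(k)}$ using the lemma; we choose the parameter $\varepsilon = \varepsilon(k) < 1/k$ in the lemma to be very small with respect to the previous stages of the construction (see below). Thus we have

$$\rho(\mathcal{P}^{(k-1)}, \mathcal{P}^{(k)}) < \varepsilon(k) \tag{3.1}$$

From the lemma we also get an integer $N(k)$ such that

$$P\big(|s_{N(k)}(y^{(k)}_1 \ldots y^{(k)}_{N(k)}) - \xi| < \tfrac{1}{k}\big) > 1 - \tfrac{1}{k} \tag{3.2}$$

and since $\mathcal{P}^{(k)}$ generates $\mathcal{Y}$ there is an integer $L(k)$ such that

$$\mathcal{P}^{(0)} \subseteq_{1/k} \bigvee_{i=-L(k)}^{L(k)} T^i\mathcal{P}^{(k)} \tag{3.3}$$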

During the construction we are free to choose the $\varepsilon(k)$ as small as we like. First of all, we will choose them so that $\sum_k \varepsilon(k) < \infty$. Since the metric $\rho = \rho_r$ is complete (or using the Borel–Cantelli lemma) this guarantees that $\mathcal{P}^{(k)}$ converges to a partition $\mathcal{P}^*$ of $\mathcal{Y}$, with associated process $(y^*_n)$. Second, note that $\rho(\mathcal{P}^*, \mathcal{P}^{(k-1)}) \le \sum_{m=k}^{\infty} \varepsilon(m)$. Thus at the beginning of step $k$ of the construction, when $\mathcal{P}^{(k-1)}$ is given, we may choose a $\delta = \delta(k) > 0$ depending on all the data defined so far and prescribe that $\rho(\mathcal{P}^*, \mathcal{P}^{(k-1)}) < \delta(k)$ by requiring $\varepsilon(m) < 2^{-m}\delta(k)$ for every $m \ge k$. The point is that the conditions (3.2) and (3.3) remain true for any partition (and associated process) sufficiently close to $\mathcal{P}^{(k)}$, and hence a prudent choice of $\delta(k)$ implies that they hold for $\mathcal{P}^*$ and $(y^*_n)$, that is,

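$$\forall m \quad P\big(|s_{N(m)}(y^*_1 \ldots y^*_{N(m)}) - \xi| < \tfrac{1}{m}\big) > 1 - \tfrac{1}{m}$$

and

$$\forall k \quad \mathcal{P}^{(0)} \subseteq_{1/k} \bigvee_{j=-L(k)}^{L(k)} T^j\mathcal{P}^*.$$

The first of these implies $\lim_{m\to\infty} s_{N(m)}(y^*_1 \ldots y^*_{N(m)}) = \xi$ in probability, and the second that $\mathcal{P}^{(0)} \subseteq \bigvee_{j=-\infty}^{\infty} T^j\mathcal{P}^*$, so $\mathcal{P}^*$ generates $\mathcal{Y}$. □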

Proof (of Lemma 3.2). We first present a sketch of the proof, and afterwards the details. Since $\mathcal{P}$ generates $\mathcal{Y}$ it has full entropy, which by assumption is equal to the entropy of $\mathcal{X}$. Therefore we can find a generating partition $\mathcal{Q}$ for $\mathcal{X}$ with $\rho(\mathcal{P}, \mathcal{Q}) < \varepsilon/2$. Let $(x_n)$ be the process determined by $\mathcal{Q}$; then $s_n(x_1 \ldots x_n) \to \xi$ in probability, so we can choose an $N$ such that

$$P\big(|s_N(x_1 \ldots x_N) - \xi| < \varepsilon\big) > 1 - \varepsilon.$$

Since $\mathcal{P}, \mathcal{Q}$ are both defined on $X$ we get a joining of the $\mathcal{P}$- and $\mathcal{Q}$-processes. Choose now a $\delta > 0$ and a suitably large $K$. Working in $\mathcal{Y}$ again, we can construct a partition $\mathcal{R}$ whose joint $K$-block distribution with $\mathcal{P}$ is within $\delta$ of the joint $K$-block distribution of $\mathcal{P}, \mathcal{Q}$. Thus (assuming we chose $K$ large enough), $\rho(\mathcal{P}, \mathcal{R})$ will be of the order of $\rho(\mathcal{P}, \mathcal{Q}) + \delta$, the $N$-block distribution of the $\mathcal{R}$-process will be within $\delta$ of the $N$-block distribution of the $\mathcal{Q}$-process, and the entropy of $\mathcal{R}$ is $\delta$-close to $h(\mathcal{Y})$. Thus although $\mathcal{R}$ does not necessarily generate $\mathcal{Y}$, we need only make an additional small correction to get a generating partition $\mathcal{P}'$ for $\mathcal{Y}$, and we can arrange that this does not disturb the $N$-block distributions very much. Now for the details:

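Choosing $\mathcal{Q}$: Since $h(\mathcal{X}, \mathcal{P}) = h(\mathcal{Y}) = h(\mathcal{X})$, by Theorem 2.7 we can find a generating partition $\mathcal{Q}$ for $\mathcal{X}$ with

$$\rho(\mathcal{P}, \mathcal{Q}) < \frac{\varepsilon}{2}.$$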

Choosing $N$ and $\delta$: Denote by $(x_n)$ the process arising from $\mathcal{Q}$. Then $s_n(x_1 \ldots x_n) \to \xi$ in probability, so there is an integer $N$ such that

$$\mu\big(|s_N(x_1 \ldots x_N) - \xi| < \varepsilon\big) > 1 - \varepsilon.$$
Note that the condition above is a property of the $N$-block distribution of $(x_n)$. Thus there is a $\delta \in (0, \varepsilon/2)$ with the property that if $(z_n)$ is a process arising from a partition $\mathcal{R}$ and the $N$-block distribution induced by $\mathcal{R}$ is within $\delta$ in $\ell^1$ of the $N$-block distribution of $\mathcal{Q}$, then $\mu(|s_N(z_1 \ldots z_N) - \xi| < \varepsilon) > 1 - \varepsilon$. Note also that if $\mathcal{R}, \mathcal{R}'$ are two partitions of $\mathcal{Y}$ and $\rho(\mathcal{R}, \mathcal{R}') < \delta/N$, then the $N$-block distributions of the processes arising from $\mathcal{R}, \mathcal{R}'$ differ by at most $\delta$.
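Choosing $\alpha$, $\beta$ and $M$: Write $h = h(\mathcal{Y})$. Invoking Theorem 2.7, choose $\alpha > 0$ such that if $\mathcal{R}$ is a partition of $\mathcal{Y}$ with entropy at least $h - \alpha$ then there is a generating partition $\mathcal{R}'$ of $\mathcal{Y}$ with $\rho(\mathcal{R}, \mathcal{R}') < \delta/2N$. Let $\beta > 0$ be such that for any partition $\mathcal{S}$ of $\mathcal{Y}$, if $\mathcal{P} \subseteq_\beta \mathcal{S}$ then $h(\mathcal{S}) > h - \alpha$. We may assume that $\beta < \delta/N$.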

Since Q generates X and "P is measurable in X there is an M > N 
such that 

M 

V TiQ 

i=-M 

Note that this property depends only on the distribution of $(\mathcal{P}\times
\mathcal{Q},2M+1)$-names, and if $\mathcal{R}$ is a partition of $\mathcal{Y}$ such that the
distribution of $(\mathcal{P}\times\mathcal{R},2M+1)$-names is within $\tau$ of the distribution of
$(\mathcal{P}\times\mathcal{Q},2M+1)$-names (in $\ell^1$), then $\mathcal{P}\subseteq_{\beta/2+\tau}\bigvee_{i=-M}^{M}T^i\mathcal{R}$.
Choosing $L$, $B$ and $\mathcal{R}$: Fix an integer $L$ with $\max\{M,N\}/L<\beta/8$
and choose a tower $B\cup TB\cup\ldots\cup T^{L-1}B$ of height $L$ in $\mathcal{Y}$ filling all
but $\beta/4$ of the space. We will define a partition $\mathcal{R}$ of $\mathcal{Y}$ by modifying
$\mathcal{P}$ at some of the points in the tower.

Let $(B_u)$ be the partition of the base $B$ according to $(\mathcal{P},L)$-names.
This partition is measurable in $\mathcal{Y}$. We can further partition each
$B_u$ according to the $(\mathcal{Q},L)$-names as $B_u=\bigcup_v B_{u,v}$. The sets $B_{u,v}$ are
measurable in $\mathcal{X}$ but may not be measurable in $\mathcal{Y}$. However, since $\mathcal{Y}$
is non-atomic, we can partition each set $B_u$ into sets $B'_{u,v}$ measurable in $\mathcal{Y}$ such
that $\mu(B'_{u,v})=\mu(B_{u,v})$. For each $B'_{u,v}$, modify the column over $B'_{u,v}$
so that it is labeled by $v$ (instead of $u$). Call the resulting partition
$\mathcal{R}$.
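The bookkeeping behind this relabelling can be illustrated in a finite model. In the sketch below (an illustration, not the construction itself), the base $B$ is a finite set of equally weighted points, each carrying an $L$-letter $(\mathcal{P},L)$-name $u$ and $(\mathcal{Q},L)$-name $v$. Since the sets $B_{u,v}$ are not available in $\mathcal{Y}$, the model keeps, inside each $B_u$, only the multiset of $\mathcal{Q}$-names and reassigns it arbitrarily; a random shuffle stands in for the choice of the sets $B'_{u,v}$. The counts of $(u,v)$ pairs, and hence the number of tower cells where the new label differs from $\mathcal{P}$, are unchanged. Function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def relabel_columns(p_names, q_names, rng):
    """Return R-names: inside each class B_u = {b : p_names[b] == u}, the Q-names that
    occur over B_u are redistributed among its points with the same multiplicities.
    This mimics choosing sets B'_{u,v} in Y with mu(B'_{u,v}) = mu(B_{u,v})."""
    classes = defaultdict(list)
    for b, u in enumerate(p_names):
        classes[u].append(b)
    r_names = [None] * len(p_names)
    for members in classes.values():
        labels = [q_names[b] for b in members]  # the multiset of v's over B_u
        rng.shuffle(labels)                       # arbitrary reassignment within B_u
        for b, v in zip(members, labels):
            r_names[b] = v
    return r_names


def disagreements(names_a, names_b):
    """Number of tower cells (base point, level) where the two column labels differ."""
    return sum(s != t for a, b in zip(names_a, names_b) for s, t in zip(a, b))


rng = random.Random(1)
L, base_size = 5, 400
p_names = [tuple(rng.randint(0, 1) for _ in range(L)) for _ in range(base_size)]
# Q-names agree with P-names at about 90% of the levels.
q_names = [tuple(s if rng.random() < 0.9 else 1 - s for s in u) for u in p_names]
r_names = relabel_columns(p_names, q_names, rng)

same_joint_counts = Counter(zip(p_names, q_names)) == Counter(zip(p_names, r_names))
print(disagreements(p_names, q_names), disagreements(p_names, r_names), same_joint_counts)
```

The two disagreement counts coincide, which is the finite analogue of the equality of measures on the tower used in the next step.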

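Since

$$\rho(\mathcal{P},\mathcal{R}) = 2\mu(\{x\in X : \mathcal{P}(x)\neq\mathcal{R}(x)\})$$

and on the tower $\bigcup_{i=0}^{L-1}T^iB$ we have

$$\mu\Big\{x\in\bigcup_{i=0}^{L-1}T^iB : \mathcal{P}(x)\neq\mathcal{R}(x)\Big\} = \mu\Big\{x\in\bigcup_{i=0}^{L-1}T^iB : \mathcal{P}(x)\neq\mathcal{Q}(x)\Big\},$$

and the tower fills all but $\beta/4$ of the mass, it follows that

$$\rho(\mathcal{P},\mathcal{R}) \le \rho(\mathcal{P},\mathcal{Q})+\frac{\beta}{2}.$$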
Choosing $\mathcal{V}$: Consider now the difference between the distribution
of $(\mathcal{P}\times\mathcal{Q},2M+1)$-names and the distribution of $(\mathcal{P}\times\mathcal{R},2M+1)$-names.
The only difference between them is incurred at the top and
bottom $M$ levels of the tower, which have total mass $<2M/L<\beta/4$,
and at the exceptional set outside the tower, whose mass is $<\beta/4$.
Therefore the distributions of $(\mathcal{P}\times\mathcal{Q},2M+1)$- and $(\mathcal{P}\times\mathcal{R},2M+1)$-names
differ by at most $\tau=\beta/2$, so

$$\mathcal{P}\subseteq_{\beta}\bigvee_{i=-M}^{M}T^i\mathcal{R}.$$
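The mass bookkeeping of this step, written out with the constants fixed above (as in the text, the discrepancy between the two name distributions is measured by the mass of the set on which they can differ):

$$\frac{2M}{L}\le\frac{2\max\{M,N\}}{L}<\frac{2\beta}{8}=\frac{\beta}{4},\qquad \frac{2M}{L}+\frac{\beta}{4}<\frac{\beta}{2}=\tau,\qquad \frac{\beta}{2}+\tau=\beta.$$

This is why the tolerance $\beta/2+\tau$ obtained from the choice of $M$ becomes exactly $\beta$.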

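Since the entropy of $\bigvee_{i=-M}^{M}T^i\mathcal{R}$ is the same as the entropy of $\mathcal{R}$,
we conclude by the choice of $\beta$ that $\mathcal{R}$ has entropy $>h-\alpha$. We can
therefore choose a generating partition $\mathcal{V}$ of $\mathcal{Y}$ with $\rho(\mathcal{V},\mathcal{R})<\delta/(2N)$.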
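We conclude that

$$\rho(\mathcal{P},\mathcal{V}) \le \rho(\mathcal{P},\mathcal{R})+\rho(\mathcal{R},\mathcal{V}) < \frac{\varepsilon}{3}+\frac{\beta}{2}+\frac{\delta}{2N} < \varepsilon.$$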

Finally, note that from the construction of $\mathcal{R}$, its $N$-block
distribution is the same as the $N$-block distribution of $\mathcal{Q}$ except for an
error introduced by the top $N$ levels of the tower, which have mass
$<\beta/4$, and by the exceptional set, also of measure $<\beta/4$; this means
that the $N$-block distributions of $\mathcal{R}$ and $\mathcal{Q}$ differ by less than $\delta/2$.
Since $\rho(\mathcal{R},\mathcal{V})<\delta/(2N)$, the $N$-block distributions of the
$\mathcal{R}$-process and the $\mathcal{V}$-process differ by at most $\delta/2$, so the $N$-block
distributions of the $\mathcal{V}$-process and the $\mathcal{Q}$-process differ by at most
$\delta$; by the definition of $\delta$ this implies

$$\mu\big(|s_N(y_1\ldots y_N)-\xi|<\varepsilon\big) > 1-\varepsilon,$$

where $(y_n)$ is the process defined by $\mathcal{V}$.
This completes the proof. □ 
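The estimates on $N$-block distributions at the end of the proof can be collected in one chain. Here $d_N(\cdot,\cdot)$ is shorthand introduced only for this display, for the $L^1$ distance between the $N$-block distributions of two processes; the first inequality combines the mass bound from the construction of $\mathcal{R}$ with $\beta<\delta/N$, and the second is the remark that $\rho<\delta/N$ moves $N$-block distributions by at most $\delta$, applied with $\delta/2$ in place of $\delta$:

$$d_N(\mathcal{R},\mathcal{Q})<\frac{\beta}{4}+\frac{\beta}{4}<\frac{\delta}{2N}\le\frac{\delta}{2},\qquad d_N(\mathcal{R},\mathcal{V})\le\frac{\delta}{2},\qquad d_N(\mathcal{V},\mathcal{Q})\le d_N(\mathcal{V},\mathcal{R})+d_N(\mathcal{R},\mathcal{Q})<\delta.$$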

4. Some Applications 
An immediate consequence of theorem 13. II is: 



Classes which cannot be distinguished by finitary invariants

Proposition 4.1. Let $\mathcal{C}$ be a saturated class of processes with entropy $h$.
Suppose that every $\mathcal{X},\mathcal{Y}\in\mathcal{C}$ either have a common factor or a common
extension in $\mathcal{C}$. Then every finitely observable invariant is constant on $\mathcal{C}$.

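Proof. If $\mathcal{X},\mathcal{Y}$ have a common factor $\mathcal{Z}\in\mathcal{C}$ then, since all members of $\mathcal{C}$ have
entropy $h$, the extensions $\mathcal{X}\to\mathcal{Z}$ and $\mathcal{Y}\to\mathcal{Z}$ have zero entropy. Hence no scheme can
distinguish $\mathcal{X}$ and $\mathcal{Z}$, and no scheme can distinguish $\mathcal{Y}$ and $\mathcal{Z}$; so every scheme must give
the same value to $\mathcal{X}$ and $\mathcal{Y}$. The case of a common extension is similar. □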

We turn now to some specific classes of processes. We begin by recovering 
some of the results of [7] using the techniques of the last section. 

Proposition 4.2 ([7]). There are no nontrivial finitely observable invariants
for the class of zero-entropy systems or for the class of zero-entropy weakly
mixing processes.

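Proof. Any two zero-entropy ergodic systems $\mathcal{X},\mathcal{Y}$ have an ergodic zero-entropy
joining (take a typical ergodic component of $\mathcal{X}\times\mathcal{Y}$), which is a common extension of
both; and if $\mathcal{X},\mathcal{Y}$ are zero-entropy weakly mixing systems, then so is the joining
$\mathcal{X}\times\mathcal{Y}$. Now apply Proposition 4.1. □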

Proposition 4.3. If $\mathcal{C}$ is a saturated family of processes which contains
the Bernoulli processes (e.g. $\mathcal{C}$ = all aperiodic finite-valued ergodic processes),
then entropy is the only finitely observable invariant.

Proof. For $h\ge 0$ let $\mathcal{C}_h=\{\mathcal{X}\in\mathcal{C}: h(\mathcal{X})=h\}$. We must show that
every finitely observable invariant on $\mathcal{C}$ is constant on each $\mathcal{C}_h$. For
$h=0$ this is the previous proposition. For $h>0$ we use Sinai's theorem,
which states that every $\mathcal{X}\in\mathcal{C}_h$ has a Bernoulli factor with entropy $h$.
By Ornstein's isomorphism theorem, any two such factors are isomorphic. Since the
Bernoulli processes are in $\mathcal{C}$, we conclude that every $\mathcal{X},\mathcal{Y}\in\mathcal{C}_h$ have a common
factor in $\mathcal{C}_h$, so by Proposition 4.1 every finitely observable invariant is constant on $\mathcal{C}_h$. □

Now for something new: 

Theorem 4.4. (1) Every finitely observable invariant for the class of
Kronecker systems is constant.

(2) Every finitely observable invariant for the class of mildly mixing zero-entropy
systems is constant.

(3) Every finitely observable invariant for the class of strongly mixing zero-entropy
systems is constant.



Proof. Again, we need only note that in these classes every two systems have 
a joining in the same class. □ 

Y. Gutman and M. Hochman 



An elementary class of systems is the class $\mathcal{R}$ of irrational rotations. A
delicate and perplexing question is whether there exist nonconstant finitely
observable invariants on this class.

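To fix notation, let $([0,1),\mathcal{B},\lambda)$ be the probability space of the unit
interval with Lebesgue measure. For $\alpha\in[0,1)\setminus\mathbb{Q}$ let $\mathcal{X}_\alpha=([0,1),\mathcal{B},\lambda,T_\alpha)$, where
$T_\alpha:[0,1)\to[0,1)$ is translation by $\alpha$, that is, $T_\alpha(x)=x+\alpha \pmod 1$. Let
$\mathcal{R}=\bigcup\{\mathcal{X}_\alpha:\alpha\in[0,1)\setminus\mathbb{Q}\}$ be these systems (note that $\mathcal{X}_\alpha\cong\mathcal{X}_{-\alpha}$). Thus
an invariant $J:\mathcal{R}\to A$ induces a map $J:[0,1)\setminus\mathbb{Q}\to A$ by $J(\alpha)=J(\mathcal{X}_\alpha)$.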

Lemma 4.5. If $J$ is a finitely observable invariant on $\mathcal{R}$, then the induced map
$J:[0,1)\setminus\mathbb{Q}\to A$ is Lebesgue measurable.

Proof. We may assume that $A=\mathbb{R}$ by composing $s_n$ with continuous real-valued
functions. Let $(s_n)$ be an observation scheme which calculates $J$. Fix
the partition $\mathcal{P}=([0,\tfrac12),[\tfrac12,1))$ of the interval into two equal halves, and
note that $\mathcal{P}$ is generating for every $\mathcal{X}_\alpha\in\mathcal{R}$. Thus, denoting by $(x^{(\alpha)}_n)$ the process
arising from $\mathcal{P}$ and the system $\mathcal{X}_\alpha$, we have

$$J(\alpha)=J(\mathcal{X}_\alpha)=\lim_{n\to\infty}s_n\big(x^{(\alpha)}_1,\ldots,x^{(\alpha)}_n\big),$$

where the limit exists in probability and is constant $\lambda$-a.e. in $\mathcal{X}_\alpha$.
Define $f_n:[0,1)\times[0,1)\to A$ by

$$f_n(\alpha,\omega)=s_n\big(x^{(\alpha)}_1(\omega),\ldots,x^{(\alpha)}_n(\omega)\big)$$

and $f:[0,1)\times[0,1)\to A$ by

$$f(\alpha,\omega)=J(\alpha).$$
To show that $J$ is measurable it suffices to show that $f$ is measurable.
And in fact, the $f_n$ are measurable with respect to the product $\sigma$-algebra, and
since $f_n$ converges in probability on every fibre $\{\alpha\}\times[0,1)$ (with respect to
$\lambda$), and the limit is the constant function $J(\alpha)$, it follows that $f_n$ converges
to $f$ in probability on $[0,1)\times[0,1)$ with respect to $\lambda\times\lambda$. □
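To make the functions $f_n$ concrete, here is a small numerical sketch. It generates $x^{(\alpha)}_i(\omega)=\mathcal{P}(T_\alpha^i\omega)$ for the two-halves partition $\mathcal{P}$, so $x^{(\alpha)}_i(\omega)=1$ exactly when $\omega+i\alpha \bmod 1\in[\tfrac12,1)$, and evaluates $f_n(\alpha,\omega)$ for a hypothetical scheme $s_n$ chosen only for illustration: the empirical frequency of the symbol 1. This scheme does not compute any isomorphism invariant; it only shows that $f_n$ is an explicit function of the pair $(\alpha,\omega)$ built from finitely many coordinates.

```python
import math


def rotation_process(alpha, omega, n):
    """x_i = P(T_alpha^i omega) for i = 1..n, where P labels [0, 1/2) by 0 and
    [1/2, 1) by 1, and T_alpha(x) = x + alpha (mod 1)."""
    return [1 if (omega + i * alpha) % 1.0 >= 0.5 else 0 for i in range(1, n + 1)]


def s_n(word):
    """Hypothetical observation scheme used only for illustration:
    the empirical frequency of the symbol 1 in the observed word."""
    return sum(word) / len(word)


def f_n(alpha, omega, n):
    """f_n(alpha, omega) = s_n(x_1^(alpha)(omega), ..., x_n^(alpha)(omega))."""
    return s_n(rotation_process(alpha, omega, n))


alpha = math.sqrt(2) - 1  # an irrational rotation number
values = [f_n(alpha, omega, 10_000) for omega in (0.0, 0.3, 0.77)]
print(values)  # each value is close to 1/2, the Lebesgue measure of [1/2, 1)
```

By equidistribution of $\omega+i\alpha \bmod 1$, this particular $f_n(\alpha,\cdot)$ converges to the constant $1/2$ on every fibre with $\alpha$ irrational.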

Theorem 4.6. Let $J:\mathcal{R}\to A$ be a finitely observable invariant for $\mathcal{R}$. Then
$J$ is constant on a set of full measure. In particular, no finitely observable
invariant on $\mathcal{R}$ is complete.

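Proof. If $\alpha,\beta\in[0,1)\setminus\mathbb{Q}$ are rationally dependent, then $\gamma=m\alpha=n\beta \pmod 1$ for
some nonzero integers $m,n$, and $\gamma$ is irrational. Thus $\mathcal{X}_\gamma$ is a factor both of $\mathcal{X}_\alpha$ and of
$\mathcal{X}_\beta$ (via $x\mapsto mx$ and $x\mapsto nx \pmod 1$ respectively), so $J(\mathcal{X}_\alpha)=J(\mathcal{X}_\beta)$.
We conclude that $J$ is a Lebesgue-measurable function on $[0,1)\setminus\mathbb{Q}$ (by Lemma 4.5) which is
constant on $\mathbb{Q}$-cosets. Any such map is constant on a set of full measure, by ergodicity
of the action of $\mathbb{Q}$ on $[0,1)$ by translations mod 1. □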
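The factor maps used in this proof are multiplication by an integer: $\pi_m(x)=mx \bmod 1$ satisfies $\pi_m\circ T_\alpha=T_{m\alpha}\circ\pi_m$ and preserves Lebesgue measure. The sketch below checks the intertwining identity numerically for a rationally dependent pair chosen for illustration ($\alpha=\sqrt2-1$ and $\beta=3\alpha/2$, so $\gamma=3\alpha=2\beta \pmod 1$ with $m=3$, $n=2$); it is a sanity check on sample points, not a proof, and the helper names are hypothetical.

```python
import math
import random


def circle_dist(a, b):
    """Distance between two points of R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def rotate(x, a):
    """T_a(x) = x + a (mod 1)."""
    return (x + a) % 1.0


def times(m, x):
    """pi_m(x) = m * x (mod 1); for a nonzero integer m this preserves Lebesgue measure."""
    return (m * x) % 1.0


alpha = math.sqrt(2) - 1
beta = 1.5 * alpha          # beta / alpha = 3/2, so alpha and beta are rationally dependent
gamma = (3 * alpha) % 1.0   # gamma = 3*alpha = 2*beta (mod 1)

rng = random.Random(2)
points = [rng.random() for _ in range(1000)]
# pi_3 should map X_alpha onto X_gamma, and pi_2 should map X_beta onto X_gamma.
err_alpha = max(circle_dist(times(3, rotate(x, alpha)), rotate(times(3, x), gamma)) for x in points)
err_beta = max(circle_dist(times(2, rotate(x, beta)), rotate(times(2, x), gamma)) for x in points)
print(err_alpha, err_beta)  # both are at the level of floating-point rounding
```

Because the identity holds for every $x$, $\mathcal{X}_\gamma$ is a common factor of $\mathcal{X}_\alpha$ and $\mathcal{X}_\beta$, which is what forces $J(\mathcal{X}_\alpha)=J(\mathcal{X}_\beta)$.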

5. Remarks and problems
Let us mention two problems which we have not been able to resolve: 

Question. Let $\mathcal{R}$ denote as before the class of irrational rotations. Is every
finitely observable invariant on $\mathcal{R}$ constant?

Question. Let $\mathcal{K}$ be the class of non-Bernoulli K-processes. Are there any
finitely observable invariants on $\mathcal{K}$ finer than entropy?

It has been known for some time that there are no complete Borel invariants
on $\mathcal{K}$ (the Borel structure comes from one of the natural topologies on
$\mathcal{K}$; see Feldman's paper). It also follows from work of Hoffman [I] that
there exist non-isomorphic K-systems $\mathcal{X},\mathcal{Y}$ of the same entropy such that
$\mathcal{X}\to\mathcal{Y}$ is an extension. This implies by proposition 14.11 that there are no
complete finitely observable invariants on $\mathcal{K}$; but this is not new in view of
Feldman's work.

If it were true that every two processes $\mathcal{X},\mathcal{Y}\in\mathcal{K}$ had a common zero-entropy
non-Bernoulli K-extension, then proposition 14.11 would imply that
there are no finitely observable invariants on $\mathcal{K}$ other than entropy. However, the
existence of such a joining is an open problem.